0. Introduction
Here's the plan.
If audio can be represented as embedding vectors, similar songs can be found by nearest-neighbor search over vector distances.
The trick is the chunking stage. Instead of embedding the whole song, the user's audio file is split into N chunks, and for each chunk the top K nearest database chunks are returned.
The top 10 similar songs are then returned using score fusion (voting + distance), which completes the recommendation.
Development Stage
Data Collection:
Open source audio dataset
Chunk Processing:
Split every audio file into clips of 5-10 seconds.
Each chunk → embedding via Wav2Vec2
[optional] Store embeddings as dictionaries {'file_name', 'chunk_id', 'embedding'}
[optional] Store chunked audio in a database {'chunk_id', audio_fileX.wav}
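The chunking and record-building steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `chunk_waveform`, `placeholder_embed`, and `build_records` are hypothetical names, the trailing partial chunk is dropped by assumption, and `placeholder_embed` is only a stand-in for the real embedding model so that the example runs without downloading anything.

```python
import numpy as np

def chunk_waveform(waveform, sample_rate, chunk_seconds=10.0):
    """Split a 1-D waveform into consecutive fixed-length chunks.

    Assumption: a trailing remainder shorter than one chunk is dropped.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

def placeholder_embed(chunk, dim=8):
    """Hypothetical stand-in for the real embedding model.

    Returns the mean absolute amplitude of `dim` equal segments.
    """
    segments = np.array_split(chunk, dim)
    return np.array([np.abs(s).mean() for s in segments], dtype=np.float32)

def build_records(file_name, waveform, sample_rate, chunk_seconds=10.0):
    """Build one {'file_name', 'chunk_id', 'embedding'} record per chunk."""
    records = []
    chunks = chunk_waveform(waveform, sample_rate, chunk_seconds)
    for i, chunk in enumerate(chunks):
        records.append({
            "file_name": file_name,
            "chunk_id": f"{file_name}_{i}",  # hypothetical id scheme
            "embedding": placeholder_embed(chunk),
        })
    return records

# 25 s of synthetic audio at 16 kHz yields two full 10 s chunks.
sr = 16_000
demo_wave = np.sin(2 * np.pi * 440 * np.arange(25 * sr) / sr).astype(np.float32)
demo_records = build_records("demo.wav", demo_wave, sr)
```

In a real run, `placeholder_embed` would be replaced by a call to the chosen audio model, and the waveform would come from an audio loader rather than a synthetic sine wave.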
Inference Stage
Chunk Processing:
FAISS Search:
Score Fusion:
Distance → Similarity: 1/(1 + distance)
Voting: Count of chunks matching same file
Combined Score: (0.6 * total_similarity) + (0.4 * votes)
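The inference steps above can be sketched as follows. Several things here are assumptions rather than the project's implementation: a brute-force NumPy search stands in for FAISS (it uses squared L2 distance, which is also what a flat L2 FAISS index reports), the names `knn_search`, `combined_score`, and `fuse_scores` are hypothetical, and both terms are divided by their maximum across candidates before weighting. The normalization is a guess based on the reported scores topping out at 1.0000.

```python
from collections import defaultdict
import numpy as np

def knn_search(index_vectors, queries, k):
    """Brute-force squared-L2 nearest neighbours (stand-in for FAISS)."""
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    dists = (diffs ** 2).sum(axis=2)           # shape: (n_queries, n_index)
    idx = np.argsort(dists, axis=1)[:, :k]     # k closest per query
    return np.take_along_axis(dists, idx, axis=1), idx

def combined_score(total_sim, votes, max_sim, max_votes, w_sim=0.6, w_votes=0.4):
    """Weighted sum of max-normalized similarity and votes (assumed form)."""
    return w_sim * total_sim / max_sim + w_votes * votes / max_votes

def fuse_scores(distances, indices, file_names):
    """Aggregate per-chunk matches into a ranked list of (file, score)."""
    votes = defaultdict(int)
    total_sim = defaultdict(float)
    for d_row, i_row in zip(distances, indices):
        for d, i in zip(d_row, i_row):
            name = file_names[i]
            votes[name] += 1                            # voting
            total_sim[name] += 1.0 / (1.0 + float(d))   # distance -> similarity
    max_votes = max(votes.values())
    max_sim = max(total_sim.values())
    scores = {
        name: combined_score(total_sim[name], votes[name], max_sim, max_votes)
        for name in votes
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy index: two chunks from each of two files; the query matches a.mp3.
index_vectors = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=np.float32)
file_names = ["a.mp3", "a.mp3", "b.mp3", "b.mp3"]
queries = np.array([[0, 0], [0, 1]], dtype=np.float32)

distances, indices = knn_search(index_vectors, queries, k=3)
ranking = fuse_scores(distances, indices, file_names)
```

As a consistency check on the assumed normalization, plugging in the Test 2 figures reported further down (votes 7 of 16, total similarity 6.47 of 14.80) gives 0.4373 for the second-ranked file, which matches the reported value. The Test 1 figures give 0.0529 against a reported 0.0525, so the actual normalization may differ slightly.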
1. Implementation details
Experiments were run on Azure ML Studio using a Standard_NC6s_v3 instance (6 cores, 112 GB RAM, 736 GB disk).
1.1 Webapp (in development ⏳)
frontend: React application ⏳
backend: FastAPI/Flask microservice ⏳
embedding/vector search services: REST API via Databricks Model Serving (GPU) ✅
CI/CD: GitHub Actions ✅
2. CLAP musical embedding model
Development stage (embedding)
compute time: more than 48 hours to embed 13,171 classical music audio files on a Standard_NC6s_v3 GPU instance (6 cores, 112 GB RAM, 736 GB disk)
database: Azure Blob Storage, storing 10-second clips and whole songs separately
vector database: list of dictionaries {'chunk_id', 'embedding', 'filename'}, pickled and stored in Databricks Unity Catalog
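A minimal sketch of the pickled "vector database" described above, assuming each record carries the three listed keys. `save_records` and `load_index` are hypothetical helpers, and a local temporary file stands in for the Unity Catalog location, which is not shown in this write-up.

```python
import pickle
import tempfile
from pathlib import Path
import numpy as np

def save_records(records, path):
    """Pickle the list of {'chunk_id', 'embedding', 'filename'} records."""
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_index(path):
    """Load the records and stack embeddings into one float32 matrix.

    Row i of the matrix corresponds to chunk_ids[i] and filenames[i].
    """
    with open(path, "rb") as f:
        records = pickle.load(f)
    matrix = np.stack([r["embedding"] for r in records]).astype(np.float32)
    chunk_ids = [r["chunk_id"] for r in records]
    filenames = [r["filename"] for r in records]
    return matrix, chunk_ids, filenames

# Two toy records with 4-dimensional embeddings (real ones are much larger).
toy_records = [
    {"chunk_id": "song_a_0", "embedding": np.zeros(4), "filename": "song_a.mp3"},
    {"chunk_id": "song_a_1", "embedding": np.ones(4), "filename": "song_a.mp3"},
]

with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "embeddings.pkl"
    save_records(toy_records, store_path)
    matrix, chunk_ids, filenames = load_index(store_path)
```

Keeping the embeddings in a single matrix with parallel id lists makes it straightforward to hand them to a vector index while still mapping each hit back to its source file.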
2.1 Results
Below are three tests.
Testing an audio file that already exists in the database, so the top recommendation should be that exact file.
Testing an audio file that doesn't exist in the database.
Testing an audio file that is not classical music.
How to interpret
Rank & File Name
Combined Score
Votes
Total Similarity
Audio Players
Input Sample: Your original audio chunk being compared
Recommended Match: Database clip matched to your input
Distance: 0 = identical, <0.2 = very similar, >0.5 = less similar
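The distance guide above can be written as a small helper. This is only a sketch: `describe_distance` and `distance_to_similarity` are hypothetical names, the "moderately similar" label for the 0.2 to 0.5 band is an assumption (the guide does not name that range), and distances are rounded to four decimals to match how they are displayed.

```python
def distance_to_similarity(distance):
    """Distance -> similarity, as 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)

def describe_distance(distance):
    """Map a match distance to the labels used in the guide."""
    if round(distance, 4) == 0:
        return "identical"
    if distance < 0.2:
        return "very similar"
    if distance > 0.5:
        return "less similar"
    return "moderately similar"  # assumption: band not named in the guide

labels = [describe_distance(d) for d in (0.0, 0.1209, 0.35, 0.7)]
```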
2.1.1 Test 1 - successfully retrieves the original file
# %% [Execution]
if __name__ == "__main__":
    logging.info("Starting analysis pipeline")
    logging.info("TEST 1 - MUSIC ALREADY IN DATABASE")
    analyze_audio(
        '../data/inputs/FilePMLP1020413-C.Mantione II. Nausicaa.mp3',
        max_recommendations=2,
        max_similar_clips=2,
        max_input_comparison=2,
    )
    logging.info("Analysis completed successfully")
Recommendations
Rank 1: FilePMLP1020413-C.Mantione II. Nausicaa.mp3
Combined Score: 1.0000
Votes: 92 | Total Similarity: 81.41
Input Sample 1:
Match 1: Distance 0.0000
Input Sample 2:
Match 1: Distance 0.0000
Match 2: Distance 0.1209
Rank 2: FilePMLP22468-01.01. A Faust Symphony- I - Faust.mp3
Combined Score: 0.0525
Votes: 5 | Total Similarity: 4.23
Input Sample 1:
Match 1: Distance 0.1406
Input Sample 2:
Match 1: Distance 0.1455
2.1.2 Test 2 - successfully retrieves similar songs
# %% [Execution]
if __name__ == "__main__":
    logging.info("Starting analysis pipeline")
    logging.info("TEST 2 - MUSIC NOT IN DATABASE")
    analyze_audio(
        '../data/inputs/Joe Hisaishi - Summer (High Quality).mp3',
        max_recommendations=2,
        max_similar_clips=2,
        max_input_comparison=2,
    )
    logging.info("Analysis completed successfully")
Recommendations
Rank 1: FilePMLP1327315-synthetic.mp3
Combined Score: 1.0000
Votes: 16 | Total Similarity: 14.80
Input Sample 1:
Match 1: Distance 0.0537
Match 2: Distance 0.0843
Input Sample 2:
Match 1: Distance 0.0542
Match 2: Distance 0.0660
Rank 2: FilePMLP06593-Variations sur un air national allemand B.14.mp3
Combined Score: 0.4373
Votes: 7 | Total Similarity: 6.47
Input Sample 1:
Match 1: Distance 0.0685